When playing games in groups, it is an advantage for individuals to have accurate statistical
information on the strategies of their opponents. Such information may be obtained by remembering
previous interactions. We consider a rock-scissors-paper game in which agents are able to recall their
last m interactions, which they use to estimate the behaviour of their opponents. At a critical memory length,
a Hopf bifurcation leads to the formation of stable limit cycles. In a mixed population, agents
with longer memories have an advantage, provided the system has a stable fixed point and there is
some asymmetry in the payoffs of the pure strategies. However, at a critical concentration of long
memory agents, the appearance of limit cycles destroys their advantage. By introducing population
dynamics that favour successful agents, we show that the system evolves toward the bifurcation
point.
I. INTRODUCTION

“The wise can learn from their enemies” [1], and how
they do this is an interesting question. The mathematical
analysis of competitions between opponents, called
“the theory of games”, began with von Neumann [2] and
Nash [3]. This early work concerned the equilibria of
games, where no agent has anything to gain by changing
strategy [4]. The question of whether, and how, game players
reach an equilibrium has been the subject of a great deal
of work since. The dynamics of games can have different
driving mechanisms. In “evolutionary game theory” [5],
devised by John Maynard Smith in 1973, agents (living
organisms) whose strategies are effective against competitors
survive and reproduce more rapidly. This replicates
their genes, and the strategies they encode, in higher
concentration in future generations. The “replicator equations”
which describe this process have applications in diverse
scientific settings, from the evolution of language [6] to
behavioural dynamics and decision making [7]. Alternatively,
death and reproduction need not be involved
if agents are able to learn from experience. An early
example of such a learning rule is “fictitious play” [8],
where players believe that their opponents are choosing
strategies at random from a stationary distribution.
They build a progressively clearer picture of this fictitious
distribution through repeated interactions and at
each round play the best response. The rule has an obvious
flaw in that the real distribution is non-stationary.
An alternative, derived from psychology, is reinforcement
learning [9], of which an example is “experience
weighted attraction” [10]. Here, actions that have proved
successful in the past are played more frequently. In the
context of cyclic competition, this can give rise to a wide
range of competitive and cooperative behaviours, including
quasiperiodicity, limit cycles, intermittency and chaos
[11]. If agents learn by sampling a finite number of their
opponents’ moves between strategy updates, the noise
inherent in these samples has been shown [12] to lead
to noise-sustained cycling or removal of periodic orbits
present in the limit of infinite sample size. The process
of sampling between strategy updates is referred to as
“batch learning”.

In this paper we introduce learning dynamics in which
agents possess a simple form of finite memory for their
previous interactions. They use this to predict the
current best pure strategy, adjusting their probabilistic
strategy after each new interaction. Each agent’s
memory acts as a sample used for “online learning” as
opposed to batch learning, and this distinction is central
to the effects we uncover. We use our learning rule
to investigate the children’s game of rock-scissors-paper,
where rock blunts scissors, scissors cut paper, and paper
wraps rock. The three strategies cyclically dominate
each other [13], a situation which can arise in nature.
For example, male side-blotched lizards [14] adopt one
of three mating strategies: ultra-dominant with a large
territory, mate guarding in a small territory, or sneaker
without territory, mating opportunistically. Dominant
lizards beat guarders, but are vulnerable to sneakers,
whereas guarders beat sneakers, creating a cyclical
competition and oscillations in the frequencies of the three
strategies. Cyclic competition can also arise in sociological
contexts [15]. When the rock-scissors-paper game is
studied using the replicator equations, the dynamics lack
stable limit cycles. Depending on the values of payoffs
for winning or losing, the system exhibits one of three
kinds of behaviour: stable coexistence, neutrally stable
cycles, or cycles of increasing amplitude [16]. Our first
result is that, in our memory based
learning dynamics, limit cycles can form at a critical memory
length via a Hopf bifurcation [17]. The appearance
of such cycles, also created by a Hopf bifurcation, has
recently been discovered in the replicator equations,
provided that mutations from one strategy to another [18],
or more complex patterns of mutation [19], are allowed.
In contrast to this work, due to the memory present in
our system, our dynamics is most naturally described by
delay equations.

The neurological mechanisms by which humans and
animals remember information are not completely
understood [ 2 ^ , but it is clear that there are multiple types
of memory, and memory systems M- Broadly speaking,
memories fall into two classes: they are either explicit
recollections of events or facts (“declarative” memories)
or learned motor or social skills and social conditioning
[23]. Our agents are endowed with a simple, finite declarative
memory for interactions, but our analysis could be
repeated with other models of memory, or an empirical
“forgetting curve” which describes how memory decays
with time. Our current aim is to investigate how the
length of an agent’s memory can influence their individual
effectiveness, and the dynamics of the game. In common
with earlier investigations of the use of sampling to
determine strategy updates [12], we find that sample size
(memory length), which determines the strength of noise
in the data, has a powerful effect. After demonstrating
the appearance of limit cycles, we investigate how reduced
noise aids decision making. Because each agent’s
memory includes both recent and older behaviour,
strategy updates are based on slightly out-of-date
information. Provided that strategy adjustments are sufficiently
small or, equivalently, memory is not too long,
the game has a stable fixed point, and agents with
long memories fare better than their short memory
counterparts; their prediction of the best strategy is subject to
smaller random errors. However, the limit cycles which
appear when long memory agents are in high enough
concentration destroy their competitive advantage.


II. MODEL DEFINITION 

We study the zero sum rock-scissors-paper game, with 
payoff matrix 

The game is played in continuous time by $L > 2$ agents
who interact using random pairings which occur at rate
$L/2$ per unit time, so that each agent experiences, on
average, one interaction per unit time. Formally, the
probability that a single pairing takes place in time $\delta t$ is
$L\,\delta t/2 + o(\delta t)$. Agents each adopt a probabilistic strategy
which, for agent $i \in \{1, \ldots, L\}$, after the $n$th interaction is
written $[r_i, s_i]_n$, where $r_i$, $s_i$ and $1 - r_i - s_i$ are the
probabilities of playing, respectively, rock, scissors or paper at
the next interaction. Each agent is able to recall his last
$m$ interactions, producing a sample $\{R, S\}$ from the population
of strategies, where $R$ and $S$ are the numbers of
rock and scissors interactions in his memory. Note that
at any given time, both samples $\{R, S\}$ and strategies
$[r_i, s_i]_n$ require only two parameters for their description.

Agents estimate the current average probability
weights of their opponents as the fractions in their current
sample, and use this to discern the optimal strategy.



FIG. 1. Domains in which each of the three strategies is
assessed to be the best.


From the form of the payoff matrix (1), we see that the
optimal strategy is determined by which of the following
domains the current sample lies in

These domains are illustrated in Figure 1. At each point
in time, each agent’s memory defines the position of a
random walker in the simplex $\{(R, S) \mid R + S \le m\}$, with
each new interaction determining the next step of the
walk. After each step an agent will update his strategy
according to the domain in which his memory lies, using
the following rule

$$
[r_i, s_i]_{n+1} = (1 - \epsilon)\,[r_i, s_i]_n + \epsilon \times
\begin{cases}
[1, 0] & \text{if } (R, S) \in \mathcal{D}_R \\
[0, 1] & \text{if } (R, S) \in \mathcal{D}_S \\
[0, 0] & \text{if } (R, S) \in \mathcal{D}_P.
\end{cases}
\qquad (5)
$$

For certain combinations of payoffs and memory length,
$(R, S)$ can lie on the boundary between two domains, in
which case no update is made. The parameter $\epsilon \in [0, 1]$,
the “update rate”, describes the sensitivity of agents to
new information, and $n$ indexes the number of interactions.
According to definition (5), the current strategy
is an exponential moving average of the strategies
which were estimated to be optimal from past interactions.
Our learning rule has some commonality with experience
weighted attraction rules used in recent studies
II3) [I 3 where a parameter analogous to our $\epsilon$ is used
to describe how rapidly agents respond to new information.
In these studies, the parameter is interpreted as a
measure of memory in the learning process, because it
determines how much weight is given to previous information.
Using this interpretation, our model may be seen
as containing two kinds of memory: an “active” memory
(the recent sample) used for decision making, and a “passive”
memory which equally weights all previous samples
in the limit $\epsilon \to 0$ and only uses the most recent when
$\epsilon = 1$. We note also that our rule, in common with fictitious
play [8], implicitly assumes that agents’ samples
are drawn from a stationary distribution.

FIG. 2. Probability weights of rock (open circles), scissors
(dots) and paper (squares) agents in a group of size $L = 100$
with $\alpha = 2$, $\beta = \gamma = 1$. All agents have memory $m = 100$
and update rate $\epsilon$ = 5 x 10“^. Dashed lines show solutions
to delay equations (28) and (29) using the same parameter
values.

FIG. 3. Probability weights of rock (open circles), scissors
(dots) and paper (squares) agents in a group of size $L = 100$
with $\alpha = 2$, $\beta = \gamma = 1$. All agents have memory $m = 100$ and
update rate $\epsilon$ = 10“®. Dashed lines show solutions to delay
equations (28) and (29) using the same parameter values.
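
The learning rule (5) and the sampling of opponents’ moves can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code behind the figures. It assumes a payoff convention in which rock beats scissors with payoff `alpha`, scissors beats paper with `beta`, and paper beats rock with `gamma`, the loser receiving the negative amount; this convention is an assumption, because the payoff matrix (1) is not reproduced above. Pairings are drawn one at a time rather than in continuous time, memories start filled with random moves, and all function names are hypothetical.

```python
import random
from collections import deque

ROCK, SCISSORS, PAPER = 0, 1, 2

def best_response(R, S, m, alpha=1.0, beta=1.0, gamma=1.0):
    """Pure strategy judged best for a memory holding R rocks and S scissors
    out of m remembered moves, or None on a tie (no update is made).

    ASSUMPTION: rock beats scissors (payoff alpha), scissors beats paper
    (beta), paper beats rock (gamma); the loser receives minus that amount.
    """
    P = m - R - S
    scores = [alpha * S - gamma * P,   # expected gain from playing rock
              beta * P - alpha * R,    # ... from playing scissors
              gamma * R - beta * S]    # ... from playing paper
    top = max(scores)
    winners = [k for k, v in enumerate(scores) if v == top]
    return winners[0] if len(winners) == 1 else None

def update_strategy(strategy, target, eps):
    """Rule (5): move [r_i, s_i] a fraction eps toward the target pure strategy."""
    r, s = strategy
    tr, ts = {ROCK: (1.0, 0.0), SCISSORS: (0.0, 1.0), PAPER: (0.0, 0.0)}[target]
    return ((1 - eps) * r + eps * tr, (1 - eps) * s + eps * ts)

def play(strategy, rng):
    """Draw a move from the probabilistic strategy [r, s, 1 - r - s]."""
    r, s = strategy
    u = rng.random()
    return ROCK if u < r else (SCISSORS if u < r + s else PAPER)

def simulate(L=100, m=100, eps=1e-3, steps=20000, seed=0, **payoffs):
    """Return the population-averaged (r, s) after each random pairing."""
    rng = random.Random(seed)
    strategies = [(1 / 3, 1 / 3)] * L
    # Arbitrary initial condition: memories filled with uniformly random moves.
    memories = [deque((rng.randrange(3) for _ in range(m)), maxlen=m)
                for _ in range(L)]
    history = []
    for _ in range(steps):
        i, j = rng.sample(range(L), 2)
        move_i, move_j = play(strategies[i], rng), play(strategies[j], rng)
        for agent, seen in ((i, move_j), (j, move_i)):
            memories[agent].append(seen)          # oldest move drops out
            R = memories[agent].count(ROCK)
            S = memories[agent].count(SCISSORS)
            target = best_response(R, S, m, **payoffs)
            if target is not None:
                strategies[agent] = update_strategy(strategies[agent], target, eps)
        history.append((sum(r for r, _ in strategies) / L,
                        sum(s for _, s in strategies) / L))
    return history
```

Calling `simulate(alpha=2.0, beta=1.0, gamma=1.0)` returns a trajectory of the averaged weights that can be plotted against the number of pairings; since each pairing involves two of the $L$ agents, $L/2$ pairings correspond to one unit of time in the model.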


III. SIMULATION RESULTS 
A. Single memory length 

We begin by considering the behaviour of a population
of agents, all of whom have the same memory length and
update rate. In order to explore the dynamics of the
population, we observe the average probability weights
associated with each strategy. For example, the weight
associated with rock is

$$
r(t) := \frac{1}{L} \sum_{i=1}^{L} r_i(t), \qquad (6)
$$

with $s(t)$ and $p(t)$ defined similarly. The evolution of
the total probability weights in a population with update
rate $\epsilon$ = 5 x 10“^ and memory $m = 100$ is shown
in Figure 2. Initial oscillations decay, eventually leaving
the system in a stable state. The strategies of individuals
in the population track the curves in Figure 2 but
with a greater stochastic component. In Figure 3 we have
increased the rate to $\epsilon$ = 10“^, and we see that stable
oscillations have formed. For given memory length, these
appear when the update rate is sufficiently large. Conversely,
for given update rate, stable oscillations emerge
when the memory length is sufficiently large. We will
show later that it is the product $\epsilon m^2$ which must exceed
a critical value for stable oscillations to appear. This
transition from stable equilibrium to stable oscillations
at critical parameter values is known as a “Hopf bifurcation” [17].


B. Agents as statisticians 

The memory of each agent represents a sample from
a time varying population, which is used to estimate
the current properties of the population. It is the fact
that the sample is not drawn from the current population
which allows oscillations to form. Since the primitive
method used by our agents to estimate strategy fractions
plays a role in destabilizing the fixed point, it is interesting
to compare it to more sophisticated methods in order
to determine its plausibility as the choice of an agent with
some degree of “common sense”, assuming he can recall
the times at which the interactions in his sample took
place. We do not formally address the question of which
estimation technique would be selected by rational agents
with unlimited intellectual resources.

In the same way that the method of regression is used
to estimate, by maximum likelihood, the values of continuous
explanatory variables given a sample of continuous
response variables, so logistic regression [2^ produces
estimates when the response variables are discrete.
The explanatory variables in our system are the collective
time varying strategies of the group, whereas the
response variables are the sequences of observations of
strategy types that each agent makes. In order to investigate
how logistic regression compares against our agents’
naive approach, as a technique for estimating the current
strategy weights, we will use it to estimate the current
rock weight given a finite memory sample.

Letting $t_{ik}$ be the time at which agent $i$ experiences his
$k$th interaction back from the current time, and $I_R(i, k)$
be the indicator function that this interaction is with a
rock agent, then the likelihood of his current memory will
be

$$
\mathcal{L}_i := \prod_{k=1}^{m} r(t_{ik})^{I_R(i,k)}\, [1 - r(t_{ik})]^{1 - I_R(i,k)}. \qquad (7)
$$

We now suppose that during the period of time covered
by the agent’s memory, the rock probability weight may
be expressed as a logistic function

$$
r(t) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 t)}}, \qquad (8)
$$

where $\beta_0$ and $\beta_1$ are constants. It is straightforward to
carry out higher order regression where the exponent in
the denominator is replaced with a polynomial of higher
order. However, the length of time covered by agents’
memory will typically be significantly shorter than the
period of oscillation, so that $r(t)$ changes in an approximately
linear way whilst their samples are being collected.
For example, the period of oscillation in Figure 3
is $T \approx 1375$ whereas the memory length is $m = 100$, and
the greatest change in $r(t)$ during a single time interval
of length 100 is 0.016. We later demonstrate that the ratio
of the period of stable oscillations to memory length
obeys

$$
\frac{T}{m} \propto \sqrt{m}, \qquad (9)
$$

at the point where oscillations begin to form, so that in
systems where agents have longer memories, these memories
cover a relatively shorter fraction of one oscillation.

To determine the values of $\beta_0$, $\beta_1$ which maximize the
likelihood $\mathcal{L}_i$ we express the log likelihood as a function
of these parameters:

This expression is then numerically maximized to find
the most likely values of $\beta_0$ and $\beta_1$, which can then be
used in equation (8) to predict the current value of $r(t)$.
In Figure 4 we see the results of this method, applied
to the memory of a single randomly chosen agent from
the simulation in Figure 3. Also shown are the results
of the primitive estimation method used by our agents.
The primitive method appears considerably more effective
at accurately predicting the current weights. However,
a subtle phase shift may be perceived in the agent’s
predictions relative to the true values. In order to investigate
this shift, and to compare the two methods
in greater detail, we construct a fictitious sequence of
Bernoulli random variables $\{X_k\}_{k=1}^{m}$ with success
probability, $p(k)$, which changes over the course of the sequence:
$p(k) = p_0 + (p_1 - p_0)k/m$. We then use our two
methods to find an estimate $\hat{p}_1$ of the value of $p_1$, for a
very large number of example sequences, allowing us to
estimate the distribution of $\hat{p}_1$. In Figures 5 and 6 we
have constructed distributions for $\hat{p}_1$ in cases $m = 100$
and $m = 20$. It is clear that logistic regression produces
estimates with much higher variance, but with smaller
bias. Whilst the primitive method is the more efficient
estimator, in the sense that the mean squared error in
its predictions is smaller, its bias creates a systematic
delay in its predictions. For larger memory values this
bias will reduce, because the period of oscillation grows
faster than the memory length, so the weights change by
a smaller amount over the course of each agent’s memory.

FIG. 4. Dashed line shows the rock weight in a group of size
$L = 100$ with $\alpha = 2$, $\beta = \gamma = 1$. All agents have memory
$m = 100$ and update rate $\epsilon$ = 10~^. Open circles show binomial
logistic regression estimates of the rock fraction made
using the memory of a single randomly selected agent at a
sequence of regularly spaced intervals. Black dots show similar
estimates made by simply computing the fraction of rock
interactions in the agent’s memory.

FIG. 5. Given a sequence $\{X_k\}_{k=1}^{m}$, where $m = 100$, of
Bernoulli random variables with changing success probability
$p(k) = p_0 + (p_1 - p_0)k/m$ where $p_0 = 0.2$, $p_1 = 0.25$. Black dots
show the estimated probability density function of the logistic
regression estimator $\hat{p}_1$, having mean 0.256 and root mean
squared error 0.07. Open circles show the estimated probability
distribution of the number of successes in the sequence,
serving as an alternative estimator for $p_1$, having mean 0.225
and root mean squared error 0.07.
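
The comparison of the two estimators on a fictitious Bernoulli sequence with $p(k) = p_0 + (p_1 - p_0)k/m$ can be sketched as follows. This is an illustrative Monte Carlo experiment under stated assumptions, not the authors’ code: the logistic fit uses a hand-written Newton iteration with a tiny ridge term and a step cap (both added here only for numerical robustness), time is rescaled to $k/m$, and the function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def naive_estimate(xs):
    """Primitive estimator: the fraction of successes in the memory."""
    return sum(xs) / len(xs)

def logistic_estimate(xs, iters=50, ridge=1e-6):
    """Fit p(k) = sigmoid(b0 + b1 * k/m) by maximum likelihood (Newton's
    method) and return the fitted probability at the latest time k = m."""
    m = len(xs)
    ts = [(k + 1) / m for k in range(m)]
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1 = -ridge * b0, -ridge * b1      # gradient of penalised log-likelihood
        h00 = h11 = ridge                      # negative Hessian entries
        h01 = 0.0
        for t, x in zip(ts, xs):
            p = sigmoid(b0 + b1 * t)
            w = p * (1.0 - p)
            g0 += x - p
            g1 += (x - p) * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        scale = min(1.0, 5.0 / max(abs(d0), abs(d1), 1e-12))  # cap the step size
        b0 += scale * d0
        b1 += scale * d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return sigmoid(b0 + b1)

def compare(p0=0.2, p1=0.25, m=100, trials=2000, seed=1):
    """Return ((mean, rmse) of the naive estimator, (mean, rmse) of the
    logistic estimator), both measured against the final probability p1."""
    rng = random.Random(seed)
    naive, logit = [], []
    for _ in range(trials):
        xs = [1 if rng.random() < p0 + (p1 - p0) * (k + 1) / m else 0
              for k in range(m)]
        naive.append(naive_estimate(xs))
        logit.append(logistic_estimate(xs))

    def stats(est):
        mean = sum(est) / trials
        rmse = math.sqrt(sum((e - p1) ** 2 for e in est) / trials)
        return mean, rmse

    return stats(naive), stats(logit)
```

The mean of the naive estimator is the average of $p(k)$ over the sequence, $p_0 + (p_1 - p_0)(m + 1)/(2m)$, so it lags behind $p_1$ whenever the probability is drifting; the logistic fit removes most of this lag at the cost of a larger spread.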
FIG. 6. Given a sequence $\{X_k\}_{k=1}^{m}$, where $m = 20$, of
Bernoulli random variables with changing success probability
$p(k) = p_0 + (p_1 - p_0)k/m$ where $p_0 = 0.2$, $p_1 = 0.3$. Light
gray histogram gives estimated probability distribution of the
logistic regression estimator $\hat{p}_1$, having mean 0.33 and root
mean squared error 0.2. Dark gray histogram gives estimated
probability distribution of the number of successes in the
sequence, having mean 0.25 and root mean squared error 0.1.



FIG. 7. Populations of $m = 10$ (open circles) and $m = 100$
(dots) agents in a system of size $L = 200$ with update rate
$\epsilon = 0.01$, initially composed of 199 short memory agents and a
single long memory agent. Game parameters are $\alpha = 2$, $\beta =
\gamma = 1$ and population dynamics parameters are p = $\kappa$ = 10“^.
Also shown is the fraction of paper agents (black line), illustrating
how the onset of oscillations coincides with stable equilibrium
between memory lengths.


C. Mixed memory and population dynamics 

We now consider the case of a mixed population containing
two memory lengths, $m \in \{10, 100\}$. Intuitively
we expect that agents with longer memories will be at
an advantage because they are able to more accurately
predict the optimal strategy; their estimates of opponents’
strategies are subject to smaller errors. To investigate
this we introduce some simple population dynamics,
based upon an exponential moving average of each
agent’s payoffs. We let $v_{ik}$ be the payoff to agent $i$ at his
$k$th interaction since the start of the game, and define his
moving average, $\bar{v}_{ik}$, as follows

$$
\bar{v}_{i,k+1} := (1 - p)\,\bar{v}_{ik} + p\, v_{i,k+1}. \qquad (12)
$$

At each pairwise interaction in the game, with probability
$\kappa \ll 1$, the agent with the lower average payoff
is replaced with a copy of the higher scoring agent and
the total score of the pair is shared equally between the
original and its copy. In this way the total payoff in the
population remains at zero. In Figure 7 we show the evolution
of the fractions of short and long memory agents
in such a simulation, along with the probability weight
for paper, as a signifier for the presence of oscillations.
Initially we have a single long memory agent, whose descendants
reproduce rapidly due to their enhanced ability
to determine the optimal strategy to play. However,
once these long memory agents are in sufficient concentration,
they create a limit cycle. The presence of this
cycle gives short memory agents an advantage, because
although their samples have higher noise, they contain
more recent data. In circumstances of rapid change, older
data becomes irrelevant, and skews the sample of the long
memory agents, leading them to make poorer choices. In
consequence the effectiveness of the two memory lengths
comes into balance. The population sizes undergo noisy
oscillations about the bifurcation point once their collective
strategies begin to oscillate. This indicates that the
bifurcation point is itself a self organized state, driven by
population dynamics.
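
The two ingredients of the population dynamics, the moving average (12) and the replacement step, can be sketched as below. This is a minimal sketch under stated assumptions: agents are represented as dictionaries with hypothetical keys `"m"`, `"strategy"` and `"avg"`, the copy inherits the memory length and current strategy of the higher scoring agent (the text does not specify what else is copied), and `rate` stands for the averaging parameter in (12).

```python
import random

def update_average(avg, payoff, rate):
    """Exponential moving average of payoffs, as in (12)."""
    return (1.0 - rate) * avg + rate * payoff

def maybe_replace(agents, i, j, kappa, rng):
    """With probability kappa, replace the lower scoring of agents i and j
    with a copy of the higher scoring one, and share the pair's total
    average payoff equally between the original and its copy.

    Returns True if a replacement took place.
    """
    if rng.random() >= kappa:
        return False
    winner, loser = (i, j) if agents[i]["avg"] >= agents[j]["avg"] else (j, i)
    shared = 0.5 * (agents[winner]["avg"] + agents[loser]["avg"])
    agents[loser] = dict(agents[winner])   # ASSUMPTION: copy memory length and strategy
    agents[winner]["avg"] = shared
    agents[loser]["avg"] = shared
    return True
```

Because the pooled score is split equally, the sum of the averages over the population is unchanged by a replacement, in line with the statement that the total payoff remains at zero.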

The fact that the payoffs of the three pure strategies
are not equal is essential in order for long memory players
to have an advantage. Simulations similar to that
illustrated in Figure 7 show that as the payoffs $\alpha$, $\beta$, $\gamma$
approach equality, the long memory players do not prosper
and the self organizing Hopf bifurcation is no longer
present. An intuitive understanding of this may be obtained
by considering the effect of shortening the memory
length, which causes individual agents’ samples to perform
higher variance random walks around the simplex of
Figure 1. As the variance of the walk increases, agents’
predictions of the best strategy are subject to greater
variability, driving their strategies toward the symmetric
case $[\tfrac{1}{3}, \tfrac{1}{3}]$. If the game is not symmetric then a small
shift towards the symmetric strategy for short memory
players gives an advantage to longer memory players. In
the symmetric case, this advantage disappears because
the symmetric strategy is optimal.

IV. THEORY 

Here we investigate the dynamics of a system of identical
agents as $m \to \infty$ and $L \to \infty$. The results we obtain
by considering these limits provide excellent approximations
to the behaviour of smaller groups of agents for a
wide range of memory values. In particular we study the
symmetric game, allowing us to discover the analytical
conditions for stability and the period of oscillation.
A. Individual estimates of optimal strategy


In order to construct equations which describe the dynamics
of the strategy weights $r(t)$, $s(t)$, $p(t)$, we require
expressions for the probabilities that randomly selected
agents will perceive each of the three strategies as optimal.
If the strategy weights are constant then the numbers
$R$, $S$ and $P = m - R - S$ of strategy types in a sample
of size $m$ will be trinomially distributed with probability
mass function

$$
f(x, y; r, s) = \frac{m!\, r^x s^y (1 - r - s)^{m - x - y}}{x!\, y!\, (m - x - y)!}. \qquad (13)
$$

In reality, the weights will change while the sample is being
taken, producing a sequence of non-identical trivariate
Bernoulli trials. Our aim is to approximate this non-stationary
distribution with the stationary version (13)
having appropriately chosen values of $r$ and $s$. We will
discover, once our analysis is complete, that the period,
$T$, of oscillations at the transition to instability satisfies
$T \propto m^{3/2}$. Letting the magnitude of the change in
$r(t)$ over a typical agent’s sample be $\delta r$ then we see that
$\delta r = O(m^{-1/2})$ as $m \to \infty$, with the same behaviour
holding for $s(t)$. Thus the approximation of the sample
distribution with a stationary distribution (13) becomes
increasingly accurate as $m$ becomes large and our approximation
is self consistent.

To calculate the probabilities that each of the strategies
$\omega \in \{\text{Rock}, \text{Scissors}, \text{Paper}\}$ will appear optimal to
a randomly selected agent, given recent behaviour of $r(t)$
and $s(t)$, we define indicator functions for the domains
defined in (2), (3) and (4)

$$
I_\omega(R, S) =
\begin{cases}
1 & \text{if } (R, S) \in \mathcal{D}_\omega \\
0 & \text{otherwise.}
\end{cases}
\qquad (14)
$$

In the limit $L \to \infty$ the interactions of an individual
agent have negligible effect on the weights $r(t)$, $s(t)$,
which evolve deterministically. If we select an agent at
random at time $t$, then conditional on the history of the
weights $\{r(u), s(u) \mid u < t\}$, the probability that this agent
will perceive $\omega$ as optimal is a functional of this history

$$
p_\omega[r, s](t) := E[I_\omega(R, S)](t). \qquad (15)
$$

Here the expectation is taken over the set of interaction
times and types in the agent’s memory. Letting $\tau_k$ be the
time elapsed since the $k$th interaction back from time $t$,
we define the following average

$$
\langle r \rangle_m := \frac{1}{m} \sum_{k=1}^{m} r(t - \tau_k), \qquad (16)
$$

which is a random variable depending on $\{\tau_k\}_{k=1}^{m}$.
Conditional on the interaction times, we approximate the
non-stationary distribution of types with the stationary
distribution $f(x, y; \langle r \rangle_m, \langle s \rangle_m)$. We note that differences
between second moments of this distribution and the true
distribution will be of order $\delta r$ and $\delta s$, and the means
will be identical. In order to average over the interaction
times, we note that for a particular agent, the time intervals
between interactions are exponentially distributed.
The distribution function for $\tau_k$ is therefore the Gamma
density

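$$
P(\tau_k \in [t, t + \delta t]) = \frac{t^{k-1} e^{-t}}{(k-1)!}\,\delta t =: \gamma_k(t)\,\delta t \qquad (17)
$$

so that

$$
\begin{aligned}
E[\langle r \rangle_m] &= \frac{1}{m} \sum_{k=1}^{m} \int_0^\infty \gamma_k(u)\, r(t - u)\, du && (18) \\
&= \int_0^\infty \left( \frac{1}{m} \sum_{k=1}^{m} \gamma_k(u) \right) r(t - u)\, du && (19) \\
&= \int_0^\infty \frac{\Gamma(m, u)}{m!}\, r(t - u)\, du && (20) \\
&=: \int_0^\infty g_m(u)\, r(t - u)\, du, && (21)
\end{aligned}
$$

where $\Gamma$ is the incomplete gamma function, defined as

$$
\Gamma(m, u) = \int_u^\infty x^{m-1} e^{-x}\, dx. \qquad (22)
$$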
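
The kernel appearing in (20) and (21) is easy to evaluate for integer $m$, because $\Gamma(m, u)/(m-1)!$ equals the probability that a unit-rate Poisson process has fewer than $m$ events in time $u$. The sketch below uses that identity; the function name `g` is hypothetical and the quadrature in the usage note is a crude trapezoid rule chosen only for illustration.

```python
import math

def g(m, u):
    """Memory kernel g_m(u) = Gamma(m, u) / m! for integer m >= 1.

    Uses Gamma(m, u) / (m - 1)! = sum_{k=0}^{m-1} u^k e^{-u} / k!,
    the Poisson probability of fewer than m events in time u.
    """
    term = math.exp(-u)      # k = 0 term of the Poisson sum
    total = term
    for k in range(1, m):
        term *= u / k        # builds u^k e^{-u} / k! iteratively
        total += term
    return total / m

def integrate_kernel(m, u_max, du=0.01):
    """Trapezoid-rule integral of g_m over [0, u_max]."""
    n = int(round(u_max / du))
    inner = sum(g(m, k * du) for k in range(1, n))
    return du * (0.5 * g(m, 0.0) + inner + 0.5 * g(m, n * du))
```

For example, `g(100, 50.0)` is very close to `1/100` while `g(100, 200.0)` is negligible, and `integrate_kernel(10, 100.0)` is close to one, as expected for a probability density that is nearly uniform on $[0, m]$.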


Equation (21) defines a probability density $g_m(u)$ having
the shape of a smoothed top hat (uniform) distribution
on $[0, m]$, and may be thought of as representing the
strength of the collective memory of all agents $u$ time
units before the present. As $m \to \infty$, the distribution
of $\langle r \rangle_m$ becomes increasingly sharply peaked about its
mean so

$$
E[f(x, y; \langle r \rangle_m, \langle s \rangle_m)] \sim f(x, y; E[\langle r \rangle_m], E[\langle s \rangle_m]) \qquad (23)
$$

where the expectation is taken over interaction times.
Defining $\bar{r}_m := E[\langle r \rangle_m]$ then:

$$
\begin{aligned}
p_\omega[r, s](t) &\sim \sum_{x=0}^{m} \sum_{y=0}^{m} f(x, y; \bar{r}_m, \bar{s}_m)\, I_\omega(x, y) && (24) \\
&=: p_\omega(\bar{r}_m, \bar{s}_m) && (25)
\end{aligned}
$$

as $m \to \infty$. Notice that the quantity $p_\omega(\bar{r}_m, \bar{s}_m)$ is a functional
only because the time averages $\bar{r}_m$, $\bar{s}_m$ are functionals,
whereas $p_\omega(\cdot, \cdot)$ is an ordinary function.
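
The sum (24) can be evaluated directly for moderate $m$. The sketch below does so under an assumed payoff convention (rock beats scissors with payoff `alpha`, scissors beats paper with `beta`, paper beats rock with `gamma`, the loser receiving the negative amount); the domains (2) to (4) are not reproduced above, so this convention and the function names are hypothetical. Samples on a boundary between domains are collected under `"tie"`, since no update is made there, and the loop is restricted to $x + y \le m$ where the mass function is non-zero.

```python
import math

def trinomial_pmf(x, y, m, r, s):
    """Probability mass function (13), for 0 < r, s and r + s < 1."""
    p = 1.0 - r - s
    log_coeff = (math.lgamma(m + 1) - math.lgamma(x + 1)
                 - math.lgamma(y + 1) - math.lgamma(m - x - y + 1))
    return math.exp(log_coeff + x * math.log(r) + y * math.log(s)
                    + (m - x - y) * math.log(p))

def perceived_optimal(m, r, s, alpha=1.0, beta=1.0, gamma=1.0):
    """Probabilities that a sample of size m, drawn with weights (r, s),
    makes each pure strategy look best, in the spirit of (24).

    ASSUMPTION: expected gains are alpha*S - gamma*P for rock,
    beta*P - alpha*R for scissors and gamma*R - beta*S for paper.
    """
    probs = {"rock": 0.0, "scissors": 0.0, "paper": 0.0, "tie": 0.0}
    for x in range(m + 1):
        for y in range(m + 1 - x):
            P = m - x - y
            scores = {"rock": alpha * y - gamma * P,
                      "scissors": beta * P - alpha * x,
                      "paper": gamma * x - beta * y}
            top = max(scores.values())
            best = [name for name, v in scores.items() if v == top]
            key = best[0] if len(best) == 1 else "tie"
            probs[key] += trinomial_pmf(x, y, m, r, s)
    return probs
```

At the symmetric point `perceived_optimal(30, 1/3, 1/3)` assigns equal probability to the three strategies, while a scissors-heavy population such as `perceived_optimal(30, 0.2, 0.6)` makes rock look best most of the time.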


B. Delay equation 

During a short finite time interval $[t, t + \delta t]$, given the
history $\{r(u), s(u) \mid u < t\}$, from the learning rule (5) the
expected changes in weights will be

$$
E[\delta r] = \epsilon\,[p_R(\bar{r}_m, \bar{s}_m) - r(t)]\,\delta t \qquad (26)
$$

$$
E[\delta s] = \epsilon\,[p_S(\bar{r}_m, \bar{s}_m) - s(t)]\,\delta t. \qquad (27)
$$

As $L \to \infty$, these changes become deterministic and the
weights obey the following coupled delay equations

$$
\frac{dr}{dt} = \epsilon\,[p_R(\bar{r}_m, \bar{s}_m) - r] \qquad (28)
$$

$$
\frac{ds}{dt} = \epsilon\,[p_S(\bar{r}_m, \bar{s}_m) - s]. \qquad (29)
$$

Solutions to these equations are shown in Figures 2 and 3,
where we see that they accurately capture the simulated
evolution of a system with the same parameter values.


C. Linear Stability Analysis for Symmetric Case 


In order to understand the conditions under which stable
oscillations form, we examine the stability of the fixed
point of equations (28) and (29). By considering the system
in the symmetric case where $\alpha = \beta = \gamma = 1$, we
can derive analytical results which provide a qualitative
understanding of the system in general. The fixed point
of the system in the symmetric case is $(r, s) = (\tfrac{1}{3}, \tfrac{1}{3})$.

As $m \to \infty$, the trinomial distribution (13) becomes
increasingly well approximated by a bivariate normal distribution
[24] having the same means $\mu_R = mr$, $\mu_S = ms$,
variances $\sigma_R^2 = mr(1 - r)$, $\sigma_S^2 = ms(1 - s)$ and correlation
$\rho = -(mrs)/(\sigma_R \sigma_S)$. By defining the parameter

$$
z = \frac{(x - \mu_R)^2}{\sigma_R^2} - \frac{2\rho\,(x - \mu_R)(y - \mu_S)}{\sigma_R \sigma_S} + \frac{(y - \mu_S)^2}{\sigma_S^2} \qquad (30)
$$

we can write this bivariate normal density as:

$$
\phi(x, y)\,dx\,dy = \frac{1}{2\pi \sigma_R \sigma_S \sqrt{1 - \rho^2}} \exp\left[ -\frac{z}{2(1 - \rho^2)} \right] dx\,dy. \qquad (31)
$$

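We can derive analytical approximations for the functionals
$p_R(\bar{r}_m, \bar{s}_m)$ and $p_S(\bar{r}_m, \bar{s}_m)$, defined in (24) and (25),
by integrating this density over the domains $\mathcal{D}_\omega$. In the
symmetric case this task may be achieved by introducing
new variables [l^ $(X, Y)$, related to $(x, y)$ as follows

$$
x = \frac{m}{3} + X - \frac{Y}{\sqrt{3}} \qquad (32)
$$

$$
y = \frac{m}{3} + \frac{2Y}{\sqrt{3}}. \qquad (33)
$$

We write the transformed density $\tilde{\phi}(X, Y; r, s)$ so that
$\phi(x, y)\,dx\,dy = \tilde{\phi}(X, Y; r, s)\,dX\,dY$. Integrals of the density
over the domains are tractable if the population
fractions are $(r, s) = (\tfrac{1}{3}, \tfrac{1}{3})$, because the peak of the bivariate
normal then coincides with the vertex where the domains
meet (see Figure 1), which is also the fixed point
of the system in the symmetric case. We can exploit this
tractability by expressing $\tilde{\phi}(X, Y; r, s)$ as a perturbation
of $\tilde{\phi}(X, Y; \tfrac{1}{3}, \tfrac{1}{3})$. We first write the population fractions
as perturbations about the fixed point

$$
r = \tfrac{1}{3} + \psi_r \qquad (34)
$$

$$
s = \tfrac{1}{3} + \psi_s, \qquad (35)
$$

where $\psi_r$ and $\psi_s$ are small fluctuations. We then define
the following ratio

$$
h(X, Y; \psi_r, \psi_s) := \frac{\tilde{\phi}(X, Y; \tfrac{1}{3} + \psi_r, \tfrac{1}{3} + \psi_s)}{\tilde{\phi}(X, Y; \tfrac{1}{3}, \tfrac{1}{3})} \qquad (36)
$$

which tends to unity as $\psi_r, \psi_s \to 0$. The first Taylor
polynomial of $h$ about $\psi_r = \psi_s = 0$ is

$$
h_1(X, Y; \psi_r, \psi_s) = 1 + \frac{6X(m - \sqrt{3}\,Y)}{m}\,\psi_r
+ \frac{6m(X + \sqrt{3}\,Y) - 9X^2 - 6\sqrt{3}\,XY + 9Y^2}{2m}\,\psi_s. \qquad (37)
$$

The density $\tilde{\phi}(X, Y; r, s)$ therefore has the following
asymptotic behaviour, as $\psi_s, \psi_r \to 0$

$$
\tilde{\phi}(X, Y; r, s) \sim h_1(X, Y; \psi_r, \psi_s)\, \frac{3\, e^{-3(X^2 + Y^2)/m}}{m\pi}. \qquad (38)
$$

We now transform to polar coordinates $X = u\cos\theta$, $Y =
u\sin\theta$, and make use of the fact that in these coordinates
the domains are symmetrical with angular width $2\pi/3$. For
example $\mathcal{D}_R = \{(u, \theta) \mid \theta \in [0, 2\pi/3]\}$. By integrating over
these domains and defining time averaged fluctuations:

$$
\bar{\psi}_r(t) := \int_0^\infty g_m(u)\, \psi_r(t - u)\, du \qquad (39)
$$

we obtain the following linearized expressions for $p_R$ and
$p_S$:

$$
p_R(\bar{\psi}_r, \bar{\psi}_s) \approx \frac{1}{3} + \frac{3(2\sqrt{m\pi} - \sqrt{3})}{8\pi}\,\bar{\psi}_r + \frac{3\sqrt{m}}{2\sqrt{\pi}}\,\bar{\psi}_s \qquad (40)
$$

$$
p_S(\bar{\psi}_r, \bar{\psi}_s) \approx \frac{1}{3} - \frac{3\sqrt{m}}{2\sqrt{\pi}}\,\bar{\psi}_r - \frac{3(2\sqrt{m\pi} + \sqrt{3})}{8\pi}\,\bar{\psi}_s. \qquad (41)
$$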

We can verify the quality of these approximations by 
comparing the expansion coefficients to the exact values 
of the derivatives oipi^{r,s) evaluated at (r, s) = (^ 7 ); 
as shown in Figure [5] The linearized delay equations (1^ 
and (I29|) may then be expressed in terms of the expansion 
coefficients: 


a{m) 

b{m) 


c(m) 


3{2^/mTT — \/3) 
Stt 

3y/rn 

2v5r 

3(2i/m7r -|- v^) 
Stt 


(42) 

(43) 

(44) 
as follows

$$\frac{1}{\epsilon}\frac{d\psi_r}{dt} = \int_0^\infty \left[a\,\psi_r(t-u) + b\,\psi_s(t-u)\right] g_m(u)\,du - \psi_r \qquad (45)$$

$$\frac{1}{\epsilon}\frac{d\psi_s}{dt} = -\int_0^\infty \left[b\,\psi_r(t-u) + c\,\psi_s(t-u)\right] g_m(u)\,du - \psi_s \qquad (46)$$

FIG. 8. [Plot: horizontal axis $m$, from 0 to 100.] The derivatives
of $p_R$ and $p_S$ with respect to $r$, $s$, $s$ and $r$, evaluated
directly from equation (24) at $(r_m, s_m) = (\tfrac{1}{3},\tfrac{1}{3})$.
The four derivatives are plotted using open circles, filled circles,
open squares, filled squares respectively. Black lines show the
corresponding expansion coefficients.

As $m \to \infty$ the weight function $g_m(u)$ approaches a step
function

$$g_m(u) \to \begin{cases} \frac{1}{m} & \text{if } u \in [0,m] \\ 0 & \text{otherwise} \end{cases} \qquad (47)$$

and the ratios $a(m)/b(m)$, $c(m)/b(m) = \tfrac{1}{2} + O(m^{-1/2})$, so
equations (45) and (46) approach the following simplified form:

$$\frac{1}{\epsilon}\frac{d\psi_r}{dt} = \frac{3}{4\sqrt{\pi m}} \int_{t-m}^{t} \left[\psi_r(u) + 2\psi_s(u)\right] du - \psi_r \qquad (48)$$

$$\frac{1}{\epsilon}\frac{d\psi_s}{dt} = -\frac{3}{4\sqrt{\pi m}} \int_{t-m}^{t} \left[2\psi_r(u) + \psi_s(u)\right] du - \psi_s. \qquad (49)$$

The symmetry of the system, together with numerical solutions
to the full equations, suggests we search for oscillatory
solutions which differ in phase by $2\pi/3$. We therefore
make the ansatz $\psi_r(t) = e^{\lambda t}$ and
$\psi_s(t) = e^{2\pi i/3}\,\psi_r(t)$, where $\lambda$ is a complex
number. Substitution of these trial solutions into (48) and
(49) yields two identical characteristic equations

$$\lambda^2 + \epsilon\lambda - \frac{3\sqrt{3}\,i\,\epsilon}{4\sqrt{\pi m}}\left(1 - e^{-\lambda m}\right) = 0. \qquad (50)$$

The fact that both characteristic equations are identical
justifies our ansatz that pairs of solutions exist which are
identical up to a phase shift. We now write $\lambda = x + iy$
and introduce the constant

$$A = \frac{3}{4}\sqrt{\frac{3}{\pi}} \qquad (51)$$
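The two limits used above can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the coefficients (42)-(44), prints the ratios $a/b$ and $c/b$ for growing $m$, and confirms that the phase factor $e^{2\pi i/3}$ produces the same complex prefactor in both simplified equations. It assumes the sign convention in which the integral term of (49) carries an overall minus sign; the function and variable names are illustrative only.

```python
import cmath
import math


def coefficients(m):
    """Expansion coefficients a(m), b(m), c(m) of equations (42)-(44)."""
    a = 3 * (2 * math.sqrt(m * math.pi) - math.sqrt(3)) / (8 * math.pi)
    b = 3 * math.sqrt(m) / (2 * math.sqrt(math.pi))
    c = 3 * (2 * math.sqrt(m * math.pi) + math.sqrt(3)) / (8 * math.pi)
    return a, b, c


# Both ratios approach 1/2 as the memory length grows.
for m in (10, 100, 1000, 10000):
    a, b, c = coefficients(m)
    print(m, a / b, c / b)

# Ansatz psi_s = omega * psi_r with a phase difference of 2*pi/3.
omega = cmath.exp(2j * math.pi / 3)
first = 1 + 2 * omega           # prefactor arising from equation (48)
second = -(2 + omega) / omega   # prefactor from (49), assuming its minus sign
print(first, second)            # both are numerically i*sqrt(3)

# The constant A of equation (51) collects sqrt(3) * 3 / (4 sqrt(pi)).
A = 0.75 * math.sqrt(3 / math.pi)
m = 100
print(math.sqrt(3) * 3 / (4 * math.sqrt(math.pi * m)), A / math.sqrt(m))
```

Because both prefactors equal $i\sqrt{3}$, the two substitutions collapse to the single equation (50).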


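The real and imaginary parts of the characteristic equation
may then be written

$$x^2 - y^2 + \epsilon x + \frac{A\epsilon}{\sqrt{m}}\, e^{-mx} \sin(my) = 0 \qquad (52)$$

$$2xy + \epsilon y - \frac{A\epsilon}{\sqrt{m}} \left[1 - e^{-mx} \cos(my)\right] = 0. \qquad (53)$$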
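Since (52) and (53) are the real and imaginary parts of a single complex equation, its oscillatory root can be followed numerically as the update rate changes. The sketch below is an illustration under stated assumptions, not the authors' procedure: it applies Newton's method to the complex characteristic function, starting from the large-$m$ estimates $\epsilon \approx 32\pi/(27m^2)$ and $\lambda \approx 2i/(Am^{3/2})$, and steps $\epsilon$ gradually toward a target value so that the iteration stays on the same root. The helper names `char_fn`, `newton` and `track_root` are hypothetical.

```python
import cmath
import math

A = 0.75 * math.sqrt(3 / math.pi)  # constant of equation (51)


def char_fn(lam, eps, m):
    """Complex characteristic function whose real/imaginary parts are (52)/(53)."""
    k = A * eps / math.sqrt(m)
    return lam * lam + eps * lam - 1j * k * (1 - cmath.exp(-lam * m))


def char_fn_prime(lam, eps, m):
    """Derivative of char_fn with respect to lam."""
    k = A * eps / math.sqrt(m)
    return 2 * lam + eps - 1j * k * m * cmath.exp(-lam * m)


def newton(lam, eps, m, steps=50):
    """Plain Newton iteration from the initial guess lam."""
    for _ in range(steps):
        lam = lam - char_fn(lam, eps, m) / char_fn_prime(lam, eps, m)
    return lam


def track_root(eps_target, m, n_steps=200):
    """Follow the oscillatory root from the large-m critical point to eps_target."""
    eps0 = 32 * math.pi / (27 * m * m)   # asymptotic critical update rate
    lam = 2j / (A * m ** 1.5)            # asymptotic root on the imaginary axis
    for i in range(n_steps + 1):
        eps = eps0 + (eps_target - eps0) * i / n_steps
        lam = newton(lam, eps, m)
    return lam


if __name__ == "__main__":
    for eps in (3e-4, 5e-4):
        lam = track_root(eps, 100)
        print(eps, lam.real, lam.imag)
```

For $m = 100$ the real part of the tracked root is negative at $\epsilon = 3\times10^{-4}$ and positive at $\epsilon = 5\times10^{-4}$, so the root crosses the imaginary axis between these two values. Note that $\lambda = 0$ also satisfies the polynomial form of the equation, which is why the iteration is started close to the oscillatory root rather than at an arbitrary point.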


For given memory length $m$, provided that $\epsilon$ is sufficiently
small, the real part of the solutions to (52) and (53) will
be negative, so the fixed point is stable. As we increase
$\epsilon$, $\lambda$ crosses the imaginary axis, creating a
switch to instability with oscillations of exponentially
increasing magnitude. Although the fixed point of the full
dynamics shares this transition to instability, the resulting
oscillations are bounded, creating a stable limit cycle,
the appearance of which is termed a “Hopf bifurcation”
[13]. To compute the critical value of the update rate $\epsilon_c$
we set $x = 0$ in equation (53), obtaining

$$\frac{1 - \cos(my)}{my} = \frac{1}{A\sqrt{m}}. \qquad (54)$$

This equation may have multiple roots, corresponding to
different frequencies of oscillation. We may determine
the asymptotic behaviour as $m \to \infty$ of the lowest root
by expanding the left hand side to linear order about
$y = 0$, and then solving for $y$, obtaining

$$y \sim \frac{8}{3}\sqrt{\frac{\pi}{3}}\; m^{-3/2} \quad \text{as } m \to \infty. \qquad (55)$$
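The quality of the linear-order expansion can be gauged by solving (54) directly. The following sketch (an illustration, with the hypothetical helper names `lowest_root` and `asymptotic_root`) finds the smallest positive root by bisection on the interval $0 < my \le 2$, where the left hand side of (54) is increasing, and compares it with the asymptotic form (55).

```python
import math

A = 0.75 * math.sqrt(3 / math.pi)  # constant of equation (51)


def lowest_root(m):
    """Smallest positive y with (1 - cos(m*y)) / (m*y) = 1 / (A*sqrt(m))."""
    target = 1 / (A * math.sqrt(m))

    def f(y):
        return (1 - math.cos(m * y)) / (m * y) - target

    lo, hi = 1e-12 / m, 2.0 / m
    if f(hi) <= 0:
        raise ValueError("m is too small for the bracket 0 < m*y <= 2")
    for _ in range(200):           # bisection: f(lo) < 0 < f(hi) throughout
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asymptotic_root(m):
    """Large-m estimate of the lowest root, equation (55)."""
    return (8 / 3) * math.sqrt(math.pi / 3) * m ** -1.5


if __name__ == "__main__":
    for m in (20, 100, 1000, 10000):
        print(m, lowest_root(m), asymptotic_root(m))
```

The exact root lies slightly above the asymptotic estimate, and the relative difference shrinks as $m$ grows, in line with the expansion being taken about $y = 0$.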


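Substitution of this result into (52), again with $x = 0$,
yields

$$\epsilon_c = \frac{256\,\pi^{3/2} \csc\!\left(\frac{8}{3}\sqrt{\frac{\pi}{3m}}\right)}{81\sqrt{3}\; m^{5/2}} \sim \frac{32\pi}{27 m^2} \quad \text{as } m \to \infty. \qquad (56)$$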


From this analysis we see that the stability of the fixed
point in the symmetric case depends on the value of the
product $\epsilon m^2$. Our asymptotic expression for $y$ gives us
the period of oscillation at the Hopf bifurcation point

$$T = \frac{2\pi}{y} \sim \frac{3}{4}\sqrt{3\pi}\; m^{3/2}. \qquad (57)$$
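As a quick arithmetic check of (56) and (57), the sketch below evaluates the critical update rate and the period for a given memory length. It assumes that the cosecant in (56) is evaluated at $my$ with $y$ taken from (55); the function names are illustrative only.

```python
import math


def eps_c_asymptotic(m):
    """Large-m critical update rate, 32*pi / (27*m^2)."""
    return 32 * math.pi / (27 * m * m)


def eps_c_with_csc(m):
    """First expression in (56), assuming csc is evaluated at m*y from (55)."""
    arg = (8 / 3) * math.sqrt(math.pi / (3 * m))
    return 256 * math.pi ** 1.5 / (81 * math.sqrt(3) * m ** 2.5 * math.sin(arg))


def period(m):
    """Asymptotic period of oscillation at the bifurcation point, equation (57)."""
    return 0.75 * math.sqrt(3 * math.pi) * m ** 1.5


if __name__ == "__main__":
    m = 100
    print(eps_c_asymptotic(m), 2 * math.pi / 16875)  # the same number
    print(eps_c_with_csc(m))                         # slightly larger
    print(period(m))                                 # about 2302
```

For $m = 100$ the asymptotic rate equals $2\pi/16875 \approx 3.7\times10^{-4}$ and the period is $750\sqrt{3\pi} \approx 2302$.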


We now test these predictions numerically, and explore 
the non-symmetric case. 
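Before turning to the full model, the predicted switch from decay to growth can be seen in the simplified linear equations (48) and (49) alone. The following is a rough sketch under stated assumptions rather than the numerical method used for the figures: it uses explicit Euler steps of unit length, approximates the integral over the last $m$ time units by a running sum, starts from a constant history, and assumes the sign convention in which the integral term of (49) is negative. The names `simulate`, `peak` and `growth_ratio` are hypothetical.

```python
import math
from collections import deque


def simulate(eps, m, t_max, psi0=(0.01, 0.0)):
    """Euler integration (dt = 1) of the linear delay equations (48)-(49)."""
    k = 3 / (4 * math.sqrt(math.pi * m))
    pr, ps = psi0
    hist_r = deque([pr] * m, maxlen=m)   # last m values of psi_r
    hist_s = deque([ps] * m, maxlen=m)   # last m values of psi_s
    sum_r, sum_s = pr * m, ps * m        # running sums approximating the integrals
    trace = []
    for _ in range(t_max):
        dr = eps * (k * (sum_r + 2 * sum_s) - pr)
        ds = eps * (-k * (2 * sum_r + sum_s) - ps)
        pr, ps = pr + dr, ps + ds
        sum_r += pr - hist_r[0]          # slide the window forward by one step
        sum_s += ps - hist_s[0]
        hist_r.append(pr)
        hist_s.append(ps)
        trace.append(pr)
    return trace


def peak(values):
    """Largest absolute value in a stretch of the trace."""
    return max(abs(v) for v in values)


def growth_ratio(eps, m=100, t_max=40000):
    """Late-time peak amplitude divided by an earlier peak amplitude."""
    trace = simulate(eps, m, t_max)
    return peak(trace[30000:40000]) / peak(trace[10000:20000])


if __name__ == "__main__":
    for eps in (3e-4, 5e-4):
        print(eps, growth_ratio(eps))
```

A ratio below one indicates decaying oscillations and a ratio above one indicates growth. In the linear equations the growth is unbounded; the bounded limit cycle only appears in the full nonlinear dynamics.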


D. Numerical tests of stability 

We consider the symmetric case first, both by numerically
solving the delay equations, where the probabilities
$p_R$ and $p_S$ are given as summations over the trinomial
distribution, and by simulation. We consider the case
$m = 100$, and in Figure 9 we have plotted $r(t)$ for values
of $\epsilon$ lying just above and just below the critical value of
$\epsilon_c = 2\pi/16875 \approx 3.7 \times 10^{-4}$ predicted by our stability
analysis. The appearance of stable oscillations is consistent
with our analysis. We also numerically compute
the period of oscillation at the critical point, finding that
$T \approx 2347$, which compares to our analytical estimate of
$T = 750\sqrt{3\pi} \approx 2302$.

FIG. 9. Bold line shows $r(t)$ from the numerical solutions to
the delay equations in the symmetric case when $m = 100$
and $\epsilon = 3 \times 10^{-4}$. Thin line shows the corresponding solution when
$\epsilon = 5 \times 10^{-4}$. Open circles and black dots show corresponding
simulation results in a system of size $L = 100$.

FIG. 10. Dependence of the steady state amplitude of $r(t)$
on $\epsilon$ for $m = 20$ (open markers) and $m = 80$ (filled markers).
Three different combinations of payoff values $(\alpha, \beta, \gamma)$
were used: (1, 1, 1) [circles], (1.5, 2.5, 1) [squares] and (2, 1, 1)
[triangles]. The vertical dotted line has horizontal coordinate
$32\pi/27$, corresponding to the critical value of $\epsilon m^2$ in the
symmetric game. $L = 500$ in all cases.

We now verify, using simulations, that the appearance
of limit cycles depends on the value of the product $\epsilon m^2$
in three representative games, each with different payoffs.
For two different memory values, $m \in \{20, 80\}$, we have
numerically determined the amplitude of oscillations in
$r(t)$ for a series of values of $\epsilon$. These amplitudes are plotted
in Figure 10 as functions of $\epsilon m^2$, and we see that the
value of $\epsilon m^2$ effectively predicts the onset of limit cycles, at
least at the levels of payoff asymmetry we have studied in
this paper. We note however that we have found the precise
critical value of this product only for the symmetric
case in the limit of large $m$. From Figure 10 we observe
that the asymmetric payoffs introduce small corrections
to this critical value.


V. CONCLUSION 

We have studied the rock-scissors-paper game played 
by agents with a simple form of memory. This memory is 
used by each agent to estimate the current best strategy. 
After each new interaction, agents incrementally update 
their own strategy, using a form of online learning. The 
naive technique for estimating strategy fractions used by
our agents has, in common with fictitious play, the underlying
assumption of a stationary distribution of agent
strategies in the population. Although this assumption
is clearly false, we have shown that the technique can act 
as a more efficient estimator than logit regression. 

Provided the system possesses a stable fixed point,
agents with longer memories are able to more accurately
determine the true weights, and therefore make better
judgements about which strategy to play. However, excessively
long agent memory produces a transition from
stable equilibrium to a limit cycle. We have shown analytically
in the symmetric case that the fixed point is
destabilized when $\epsilon m^2$ reaches a critical value, and that
the period of oscillations at the transition point grows
as $m^{3/2}$. A simple form of population dynamics, imposed
on a mixed population of long and short memory
agents, demonstrates that the initial advantage afforded
long memory agents is destroyed when they become too
numerous and destabilize the system.

Due to its role as the simplest model of cyclic competition,
the rock-scissors-paper game is heavily studied.
Recent work Bin demonstrates how the introduction 
of mutations into the replicator dynamics of the game
can produce simple limit cycles via a Hopf bifurcation
[13] in the replicator equations. We have shown that
stable limit cycles can appear at critical agent memory, 
through a Hopf bifurcation in the delay equations which 
capture the learning dynamics of the population. 